48 research outputs found

    NAS-X: Neural Adaptive Smoothing via Twisting

    We present Neural Adaptive Smoothing via Twisting (NAS-X), a method for learning and inference in sequential latent variable models based on reweighted wake-sleep (RWS). NAS-X works with both discrete and continuous latent variables, and leverages smoothing SMC to fit a broader range of models than traditional RWS methods. We test NAS-X on discrete and continuous tasks and find that it substantially outperforms previous variational and RWS-based methods in inference and parameter recovery.

    Learning Hard Alignments with Variational Inference

    There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention, such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.

    Relation between stress heterogeneity and aftershock rate in the rate-and-state model

    We estimate the rate of aftershocks triggered by a heterogeneous stress change, using the rate-and-state model of Dieterich [1994]. We show that an exponential stress distribution P(tau) ~ exp(-tau/tau_0) gives an Omori law decay of aftershocks with time ~1/t^p, with an exponent p = 1 - A sigma_n/tau_0, where A is a parameter of the rate-and-state friction law and sigma_n is the normal stress. The Omori exponent p thus decreases if the stress "heterogeneity" tau_0 decreases. We also invert the stress distribution P(tau) from the seismicity rate R(t), assuming that the stress does not change with time. We apply this method to a synthetic stress map, using the (modified) scale-invariant "k^2" slip model [Herrero and Bernard, 1994]. We generate synthetic aftershock catalogs from this stress change. The seismicity rate on the rupture area shows a huge increase at short times, even if the stress decreases on average. Aftershocks are clustered in the regions of low slip, but the spatial distribution is more diffuse than for a simple slip dislocation. Because the stress field is very heterogeneous, there are many patches of positive stress change everywhere on the fault. This stochastic slip model gives a Gaussian stress distribution, but nevertheless produces an aftershock rate which is very close to Omori's law, with an effective p <= 1 that increases slowly with time. We obtain a good estimate of the stress distribution for realistic catalogs when we constrain the shape of the distribution. However, there are probably other factors which also affect the temporal decay of aftershocks. In particular, heterogeneity of A sigma_n can also modify the parameters p and c of Omori's law. Finally, we show that stress shadows are very difficult to observe in a heterogeneous stress context. Comment: In press in JG
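    The relation p = 1 - A sigma_n/tau_0 stated in this abstract can be checked numerically. The sketch below is not the authors' code. It assumes the commonly quoted form of Dieterich's [1994] seismicity rate after a stress step, R/r = 1 / (1 + (exp(-tau/(A sigma_n)) - 1) exp(-t/t_a)), restricts the exponential distribution to positive stress changes, and uses illustrative parameter values (A sigma_n = 1, tau_0 = 4, t_a = 1) and hypothetical function names.

```python
import math

def aftershock_rate(t, a_sigma, tau0, t_a=1.0, n=8000):
    """Normalized rate R(t)/r averaged over P(tau) = exp(-tau/tau0)/tau0, tau >= 0.

    Assumes Dieterich's rate-and-state response to a stress step tau:
        R/r = 1 / (1 + (exp(-tau/a_sigma) - 1) * exp(-t/t_a)).
    The average is a trapezoidal integral over tau in [0, 20*tau0].
    """
    tau_max = 20.0 * tau0
    h = tau_max / n
    decay = math.exp(-t / t_a)
    one_minus_decay = -math.expm1(-t / t_a)  # 1 - exp(-t/t_a), accurate for small t
    total = 0.0
    for i in range(n + 1):
        tau = i * h
        weight = math.exp(-tau / tau0) / tau0
        # Same denominator as above, rearranged to avoid cancellation.
        denom = one_minus_decay + math.exp(-tau / a_sigma) * decay
        edge = 0.5 if i in (0, n) else 1.0
        total += edge * weight / denom
    return total * h

def effective_omori_exponent(t1, t2, a_sigma, tau0):
    """Estimate p from the log-log slope of R(t) between t1 and t2."""
    r1 = aftershock_rate(t1, a_sigma, tau0)
    r2 = aftershock_rate(t2, a_sigma, tau0)
    return -(math.log(r2) - math.log(r1)) / (math.log(t2) - math.log(t1))

# Illustrative values only: times are in units of t_a, stresses in units of A sigma_n.
p_numeric = effective_omori_exponent(1e-6, 1e-4, a_sigma=1.0, tau0=4.0)
p_predicted = 1.0 - 1.0 / 4.0
print(p_numeric, p_predicted)
```

    The comparison is meaningful only for times much shorter than t_a and for tau_0 > A sigma_n, where the predicted exponent lies between 0 and 1.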

    The Neural Testbed: Evaluating Joint Predictions

    Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions. We also show that the quality of joint predictions drives performance in downstream decision tasks. We find these results are robust across a wide range of generative models, and highlight the practical importance of joint predictions to the community.